Sampling Content Distributed Over Graphs

Authors

  • Pinghui Wang
  • Junzhou Zhao
  • John C. S. Lui
  • Donald F. Towsley
  • Xiaohong Guan
Abstract

Despite recent efforts to estimate the topology characteristics of large graphs (e.g., online social networks and peer-to-peer networks), little attention has been given to developing a formal methodology to characterize the vast amount of content distributed over these networks. Because of the large scale of these networks, exhaustive enumeration of this content is computationally prohibitive. In this paper, we show how one can obtain content properties by sampling only a small fraction of vertices. We first show that naively applied sampling can produce a huge bias in content statistics (e.g., the average number of content duplications). To remove this bias, one may use maximum likelihood estimation to estimate content characteristics. However, our experimental results show that one needs to sample most vertices in the graph to obtain accurate statistics with such a method. To address this challenge, we propose two efficient estimators, the special copy estimator (SCE) and the weighted copy estimator (WCE), which measure content characteristics using information available in the sampled content. SCE uses the special content copy indicator to compute the estimate, while WCE derives the estimate from meta-information in sampled vertices. Our experiments show that WCE and SCE are cost-effective and also “asymptotically unbiased”. Our methodology provides a new tool for researchers to efficiently query content distributed in large-scale networks.
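The bias described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's SCE or WCE: it assumes that copies are sampled uniformly at random and that each sampled copy reveals how many copies of its content exist (a hypothetical form of meta-information). The function names and the synthetic copy counts are illustrative assumptions. Content with many copies is sampled more often, so the plain average over sampled copies overshoots; weighting each sample by the inverse of its copy count removes that size bias as the sample grows.

```python
import random


def make_copies(copy_counts):
    """Expand per-content copy counts into one entry per stored copy.

    Each entry holds the total number of copies of its content, which is
    the meta-information this sketch assumes a sampled copy exposes.
    """
    return [k for k in copy_counts for _ in range(k)]


def naive_mean(sampled):
    """Plain average of copy counts over sampled copies (size-biased)."""
    return sum(sampled) / len(sampled)


def reweighted_mean(sampled):
    """Inverse-weighted (harmonic-mean) estimate of copies per content.

    A copy of content with k copies is k times as likely to be sampled,
    so each sample is weighted by 1/k to undo that bias.
    """
    return len(sampled) / sum(1.0 / k for k in sampled)


# Synthetic population (assumed): 900 contents with 1 copy, 100 with 100.
copy_counts = [1] * 900 + [100] * 100
true_mean = sum(copy_counts) / len(copy_counts)

rng = random.Random(0)  # fixed seed so the run is repeatable
copies = make_copies(copy_counts)
sample = [rng.choice(copies) for _ in range(2000)]

naive = naive_mean(sample)
reweighted = reweighted_mean(sample)
print(f"true mean copies per content: {true_mean:.2f}")
print(f"naive sample average:         {naive:.2f}")
print(f"inverse-weighted estimate:    {reweighted:.2f}")
```

In this toy population the naive average lands far above the true mean, while the inverse-weighted estimate stays close to it. The estimators in the paper work with the information actually available in sampled vertices, which may differ from the assumption made here.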

Similar resources

Designing Distributed Fixed-Time Consensus Protocols for Linear Multi-Agent Systems Over Directed Graphs

This technical note addresses the distributed fixed-time consensus protocol design problem for multi-agent systems with general linear dynamics over directed communication graphs. By using motion planning approaches, a class of distributed fixed-time consensus algorithms is developed, which rely only on the sampling information at some sampling instants. For linear multi-agent systems, the pro...

Adaptive Graph Signal Processing: Algorithms and Optimal Sampling Strategies

The goal of this paper is to propose novel strategies for adaptive learning of signals defined over graphs, which are observed over a (randomly time-varying) subset of vertices. We recast two classical adaptive algorithms in the graph signal processing framework, namely, the least mean squares (LMS) and the recursive least squares (RLS) adaptive estimation strategies. For both methods, a detail...

Stratified sampling for even workload partitioning applied to single source shortest path algorithm

An efficient implementation of large graph processing algorithms on distributed-memory machines requires a balanced partitioning of the graph across the machines. In a previous paper we presented an algorithm, named Workload Partitioning and Scheduling (WPS), that uses domain-specific knowledge to guide a sampling procedure in large implicitly-defined graphs. WPS’s sampling procedure is used for...

Sampling from complex networks using distributed learning automata

A complex network provides a framework for modeling many real-world phenomena in the form of a network. In general, a complex network is considered as a graph of real-world phenomena such as biological networks, ecological networks, technological networks, information networks and particularly social networks. Recently, major studies have been reported for the characterization of social networks due ...

Efficient community identification and maintenance at multiple resolutions on distributed datastores

The topic of network community identification at multiple resolutions is of great interest in practice to learn highly cohesive subnetworks about different subjects in a network. For instance, one might examine the interconnections among web pages, blogs and social co...

Journal:
  • CoRR

Volume abs/1311.3882  Issue 

Pages  -

Publication date 2012